Listing of Claims: 



1-75. (Cancelled) 

76. (Currently amended) A method for identifying nucleotides for variation in 
nucleic acids encoding a protein variant library in order to impact a desired activity, said 
method comprising: 

(a) receiving data characterizing a training set of a protein variant library, wherein the 
data comprises activity and a nucleotide sequence for each protein variant in the training set; 

(b) from the data, developing a computational algorithmic sequence activity model 
for predicting activity as a function of independent variables specifying the presence 
or absence of nucleotides at positions in the nucleotide sequence; 

(c) using the sequence activity model to rank positions in a reference nucleotide 
sequence and/or nucleotide types at specific positions in the reference nucleotide sequence in 
order of impact on the desired activity; 

(d) using the ranking to identify one or more nucleotides, in the reference nucleotide 
sequence, that are to be varied or fixed in order to impact the desired activity; 

(e) generating a new protein variant library containing one or more new protein 
variants having amino acid sequences encoded by nucleic acids in which the identified 
nucleotides are varied or fixed in order to impact the desired activity; 

(f) assaying the new protein variant library to provide an updated training set 
comprising sequence and activity information for members of the new protein variant library to 
develop a new computational algorithmic sequence activity model; and 

(g) using the new computational algorithmic sequence activity model to identify one 
or more nucleotides in a new reference nucleotide sequence that are to be varied or fixed in order 
to impact the desired activity. 
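
Steps (a) through (d) of claim 76 can be illustrated with a minimal sketch. It is not the claimed implementation: the helper names (`solve`, `fit_model`, `rank_features`), the toy sequences and activity values, the ridge-stabilized least-squares fit, and the ranking by absolute coefficient are all assumptions made for illustration only.

```python
# Illustrative sketch only; all names and data below are hypothetical.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_model(seqs, activities, reference, ridge=1e-6):
    """Fit activity = intercept + sum of coefficients for nucleotides present.

    Each independent variable is 1 when a given nucleotide is present at a
    given (0-based) position and 0 when it is absent. Only nucleotides that
    differ from the reference are modeled (an assumption of this sketch).
    """
    features = sorted({(i, nt) for s in seqs
                       for i, nt in enumerate(s) if nt != reference[i]})
    rows = [[1.0] + [1.0 if s[pos] == nt else 0.0 for pos, nt in features]
            for s in seqs]
    p = len(features) + 1
    # Normal equations with a small ridge term for numerical stability.
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(p)]
    weights = solve(xtx, xty)
    return features, weights[0], weights[1:]


def rank_features(features, coefs):
    """Rank (position, nucleotide) pairs by absolute modeled impact on activity."""
    return sorted(zip(features, coefs), key=lambda fc: abs(fc[1]), reverse=True)


# (a) Toy training set: (nucleotide sequence, measured activity). Invented values.
reference = "ATGGCA"
training = [("ATGGCA", 1.00), ("ATCGCA", 1.50), ("ATGGTA", 0.80),
            ("ATCGTA", 1.30), ("ATGGCG", 1.05), ("ATCGCG", 1.55)]
seqs = [s for s, _ in training]
acts = [a for _, a in training]

# (b) Develop the model, (c) rank positions and nucleotide types by impact.
features, intercept, coefs = fit_model(seqs, acts, reference)
ranked = rank_features(features, coefs)
for (pos, nt), coef in ranked:
    print(f"position {pos}, nucleotide {nt}: coefficient {coef:+.3f}")
```

In a sketch like this, step (d) could read the ranked list from the top: nucleotides with large positive coefficients are candidates to fix in the next library, while the remaining positions are candidates to vary. That selection rule is an assumption of the sketch and is not taken from the claims.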

77. (Previously presented) The method of claim 76, wherein the nucleotides to be 
varied are codons encoding particular amino acids. 

78. (Previously presented) The method of claim 77, wherein the activity is a function 
of expression of nucleic acids. 

79. (Currently amended) A computer program product comprising a tangible 
machine readable storage medium on which is provided program instructions for identifying 
nucleotides for variation in nucleic acids encoding a protein variant library in order to impact 
a desired activity, said instructions comprising: 

(a) code for receiving data characterizing a training set of a protein variant library, 
wherein the data comprises activity and a nucleotide sequence for each protein variant in the 
training set; 

(b) code for using the data to develop a computational algorithmic sequence 
activity model for predicting activity as a function of independent variables specifying 
the presence or absence of nucleotides at positions in the nucleotide sequence; 

(c) code for using the sequence activity model to rank positions in a reference 
nucleotide sequence and/or nucleotide types at specific positions in the reference nucleotide 
sequence in order of impact on the desired activity; 

(d) code for generating a ranked list of the nucleotide positions and/or the nucleotide 
types at specific positions in the reference nucleotide sequence; 

(e) code for using the ranking to identify one or more nucleotides, in the reference 
nucleotide sequence, that are to be varied or fixed in order to impact the desired activity; 

(f) code for receiving activity data characterizing a new protein variant library 
containing one or more protein variants having sequences in which the identified nucleotides 
were varied or fixed in order to impact the desired activity; 

(g) code for using the activity data characterizing the new protein variant library to 
provide an updated training set comprising sequence and activity information for members of the 
new protein variant library to develop a new computational algorithmic sequence activity model; 

(h) code for using the new computational algorithmic sequence activity model to 
identify one or more nucleotides in a new reference nucleotide sequence that are to be varied or 
fixed in order to impact the desired activity; and 

(i) code for outputting information, in a user readable format, identifying members of 
the new protein variant library. 

80. (Previously presented) The computer program product of claim 79, wherein the 
nucleotides to be varied are codons encoding particular amino acids. 

81. (Previously presented) The computer program product of claim 79, wherein the 
activity is a function of expression of nucleic acids. 

82-101. (Cancelled) 

102. (Currently amended) The method of claim 76, wherein (e) comprises 
expressing the new protein variant library from polynucleotides encoding members of the new 
protein variant library and wherein the polynucleotides are prepared by gene synthesis. 

103. (Currently amended) The method of claim 76, wherein (e) comprises 
expressing the new protein variant library from polynucleotides encoding members of the new 
protein variant library and wherein the polynucleotides are prepared by mutagenesis. 

104. (Currently amended) The method of claim 76, wherein (e) comprises 
expressing the new protein variant library from polynucleotides encoding members of the new 
protein variant library and wherein the polynucleotides are prepared by performing a 
recombination-based diversity generation mechanism. 

105. (Currently amended) The method of claim 76, wherein (f) comprises 
screening the new protein variant library to identify protein variants having the 
desired activity. 

106. (Previously presented) The method of claim 105, further comprising sequencing 
the identified protein variants having the desired activity. 

107. (Previously presented) The method of claim 106, further comprising repeating (a) 
- (c) using the activity and sequence data from protein variants in the new protein variant library. 

108. (Currently amended) The method of claim 76, wherein the members of the 
new protein variant library comprise the same amino acid sequence encoded by different 
nucleotide sequences. 
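
Claims 77 and 108 concern codon-level variation, where different nucleotide sequences encode the same amino acid sequence. The following sketch enumerates such synonymous sequences under the standard genetic code. The function names (`translate`, `synonymous_variants`) and the example sequence are hypothetical and not taken from the claims.

```python
# Illustrative sketch only; function names and the example are hypothetical.
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence into an amino acid string."""
    return "".join(CODON_TABLE[nt_seq[i:i + 3]]
                   for i in range(0, len(nt_seq), 3))


def synonymous_variants(nt_seq):
    """List every nucleotide sequence encoding the same amino acids as nt_seq."""
    choices = []
    for i in range(0, len(nt_seq), 3):
        aa = CODON_TABLE[nt_seq[i:i + 3]]
        choices.append([c for c, a in CODON_TABLE.items() if a == aa])
    return ["".join(combo) for combo in product(*choices)]


# Example: methionine has one codon and lysine has two, so two variants result.
variants = synonymous_variants("ATGAAA")
print(variants)
```

The number of synonymous sequences grows multiplicatively with sequence length, so a practical library would sample from this space rather than enumerate it.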

109-119. (Cancelled) 

120. (Currently amended) The method of claim 76, wherein developing the new 
computational algorithmic sequence activity model comprises performing a regression analysis. 

121. (Previously presented) The method of claim 120, wherein the regression analysis is 
based on a partial least square regression. 

122. (Previously presented) The method of claim 120, wherein the regression analysis is 
based on a principal component regression. 

123. (New) The computer program product of claim 79, wherein the code for developing 
the new sequence activity model comprises code for performing a regression analysis. 

124. (New) The computer program product of claim 123, wherein the code for 
performing the regression analysis comprises code for performing a partial least squares 
regression. 

125. (New) The computer program product of claim 123, wherein the code for 
performing the regression analysis comprises code for performing a principal component 
regression. 
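
The principal component regression named in claims 122 and 125 can be sketched as follows. This is one common formulation (project the centered indicator variables onto their leading principal components, then regress activity on the component scores), offered as an assumption rather than as the method used by the applicant. It assumes NumPy is available; the function name `pcr_fit` and the toy data are hypothetical.

```python
# Illustrative sketch only; pcr_fit and the data below are hypothetical.
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression: returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Principal directions are the right singular vectors of the centered data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components].T            # shape (n_features, n_components)
    scores = Xc @ loadings                    # shape (n_samples, n_components)
    # Ordinary least squares of centered activity on the component scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = loadings @ gamma                   # back to the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Rows: protein variants. Columns: presence (1) or absence (0) of three
# hypothetical nucleotide substitutions. Activities are invented values.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
y = [1.00, 1.50, 0.80, 1.30, 1.05, 1.55]

intercept, beta = pcr_fit(X, y, n_components=3)
print(intercept, beta)
```

With all three components retained, the fit matches ordinary least squares on this toy data. Retaining fewer components trades some fit for stability when the indicator variables are strongly correlated.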